Autonomic Runtime Manager for Large Scale Adaptive Distributed Applications
Abstract
Large-scale distributed applications are highly adaptive and heterogeneous in their computational requirements. The computational complexity associated with each computational region or domain varies continuously and dramatically, in both space and time, throughout the life cycle of the application's execution. Consequently, static scheduling techniques cannot efficiently optimize the execution of these applications at runtime. In this paper, we present an Autonomic Runtime Manager (ARM) that uses the application's spatial and temporal characteristics as the main criteria to self-optimize the execution of distributed applications at runtime. A wildfire spread simulation is used as a running example to demonstrate the ARM's effectiveness in controlling and managing the application's execution. The behavior of the wildfire simulation depends on many complex factors that contribute to its adaptive and heterogeneous behavior, such as fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions. Consequently, the application's execution cannot be predicted a priori, which makes static parallel or distributed algorithms very inefficient. The ARM is implemented using two modules: 1) an Online Monitoring and Analysis Module, and 2) an Autonomic Planning and Scheduling Module. The online monitoring and analysis module interfaces with different kinds of application and system sensors, which collect the information needed to accurately determine the current state of the fire simulation (the number and locations of burning and unburned cells) as well as the states of the resources, and decides whether the autonomic planning and scheduling module should be invoked. The autonomic planning and scheduling module uses the resource capability models together with the current state of the computations to repartition the whole computational workload across the available processors.
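The interplay of the two modules can be illustrated with a minimal sketch. It is not the paper's implementation: the per-state cell costs, the imbalance threshold, the function names, and the contiguous capability-proportional split are all assumptions chosen for illustration. The monitoring step estimates per-processor load from cell states and decides whether to invoke the planner; the planning step reassigns cells so that each processor's share of the total cost is roughly proportional to its capability.

```python
from typing import List

# Hypothetical relative cost of simulating one cell per time step, by state.
# These weights are illustrative assumptions, not values from the paper.
CELL_COST = {"unburned": 1.0, "burning": 10.0, "burned": 0.5}


def processor_loads(cell_states: List[str], owners: List[int], n_procs: int) -> List[float]:
    """Monitoring: sum the estimated cost of the cells owned by each processor."""
    loads = [0.0] * n_procs
    for state, owner in zip(cell_states, owners):
        loads[owner] += CELL_COST[state]
    return loads


def imbalance(loads: List[float], capabilities: List[float]) -> float:
    """Analysis: slowest processor's estimated time divided by the ideal time."""
    times = [load / cap for load, cap in zip(loads, capabilities)]
    ideal = sum(loads) / sum(capabilities)
    return max(times) / ideal if ideal > 0 else 1.0


def should_repartition(loads: List[float], capabilities: List[float],
                       threshold: float = 1.2) -> bool:
    """Decide whether the planning step should be invoked (threshold is assumed)."""
    return imbalance(loads, capabilities) > threshold


def repartition(cell_states: List[str], capabilities: List[float]) -> List[int]:
    """Planning: assign cells, in order, to processors in contiguous blocks
    whose total cost is roughly proportional to each processor's capability."""
    costs = [CELL_COST[s] for s in cell_states]
    total_cost, total_cap = sum(costs), sum(capabilities)
    owners: List[int] = []
    proc = 0
    boundary = total_cost * capabilities[0] / total_cap  # cumulative cost limit
    running = 0.0
    for cost in costs:
        # Advance to the next processor once this cell's midpoint passes the limit.
        while proc < len(capabilities) - 1 and running + cost / 2 > boundary:
            proc += 1
            boundary += total_cost * capabilities[proc] / total_cap
        owners.append(proc)
        running += cost
    return owners
```

In this sketch, a static half-and-half split of a domain whose burning cells are clustered at one end leaves one processor with most of the work, so `should_repartition` returns `True`; after `repartition`, the estimated loads even out and it returns `False`.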
Our experimental results show that ARM improves the performance of the wildfire simulation by 45% compared with a static partitioning algorithm. We also evaluate the performance of ARM using two partitioning strategies. The first approach partitions the wildfire simulation domain into Natural Regions (NR), where each region has the same temporal and spatial characteristics (e.g., burned (NR1), burning (NR2), and unburned (NR3) regions), and schedules each region onto the available processors. The second approach views the wildfire domain as a graph and uses a graph partitioning tool (e.g., ParMETIS) to partition the graph into different domains.
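The Natural Regions strategy can be sketched as follows. The grouping of cells by state into NR1, NR2, and NR3 follows the abstract; the scheduling policy (spreading each region's cells round-robin over the processors) and the function names are assumptions made for illustration, since the abstract does not specify how regions are mapped to processors.

```python
from collections import defaultdict
from typing import Dict, List

# Region labels follow the abstract: burned (NR1), burning (NR2), unburned (NR3).
REGION_OF_STATE = {"burned": "NR1", "burning": "NR2", "unburned": "NR3"}


def natural_regions(cell_states: List[str]) -> Dict[str, List[int]]:
    """Group cell indices into natural regions by their current state."""
    regions: Dict[str, List[int]] = defaultdict(list)
    for idx, state in enumerate(cell_states):
        regions[REGION_OF_STATE[state]].append(idx)
    return dict(regions)


def schedule_regions(regions: Dict[str, List[int]], n_procs: int) -> Dict[int, List[int]]:
    """Assumed policy: deal each region's cells round-robin across processors,
    so every processor receives a near-equal share of every region."""
    assignment: Dict[int, List[int]] = {p: [] for p in range(n_procs)}
    next_proc = 0  # running counter, so no processor is favored across regions
    for name in sorted(regions):
        for cell in regions[name]:
            assignment[next_proc].append(cell)
            next_proc = (next_proc + 1) % n_procs
    return assignment
```

Because cells within one natural region are assumed to have similar cost, giving each processor an equal count of cells from each region is a simple way to balance load without per-cell cost estimates.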
Related resources
Towards autonomic application-sensitive partitioning for SAMR applications
Distributed structured adaptive mesh refinement (SAMR) techniques offer the potential for accurate and cost-effective solutions of physically realistic models of complex physical phenomena. However, the heterogeneous and dynamic nature of SAMR applications results in significant runtime management challenges. This paper investigates autonomic application-sensitive SAMR runtime management strate...
Adaptive Runtime Management of Spatial and Temporal Heterogeneity for Dynamic Grid Applications
This paper addresses the runtime management of spatial and temporal heterogeneity in both scientific applications and geographically distributed resources in Grid computing environments. The targeted applications are large-scale dynamic Grid applications which require large amounts of computational resources, typically spanning multiple sites, and exhibit very long execution times. An adaptive ru...
Mobile Object Layer: A Runtime Substrate for Parallel Adaptive and Irregular Computations
In this paper we present a parallel runtime substrate, the Mobile Object Layer (MOL), that supports data or object mobility and automatic message forwarding in order to ease the implementation of adaptive and irregular applications on distributed memory machines. The MOL implements a global logical name space for message passing and distributed directories to assist in the translation of logica...
Towards fully autonomic peer-to-peer systems
Large-scale distributed applications are becoming more and more demanding in terms of efficiency and flexibility of the technological infrastructure, for which traditional solutions based on the client/server paradigm are not suitable. The peer-to-peer paradigm provides an appealing solution to this problem, making it possible to deploy robust networks of collectors, providers and consumers of resources....
Introduction: Enabling Large-Scale Computational Science: Motivations, Requirements, and Challenges
The exponential growth in computing, networking, and storage technologies has ushered in unprecedented opportunities for parallel and distributed computing research and applications. In the meantime, emerging large-scale adaptive scientific and engineering applications are requiring an increasing amount of computing and storage resources to provide new insights into complex systems. Furthermore...
Publication date: 2005